6 research outputs found

    Learning from Positive and Unlabeled Examples

    In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. In addition, for some binary classification problems, positive examples, that is, examples of the target class, are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? In this paper we investigate the design of learning algorithms from positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the Statistical Query learning model to describe such algorithms. Here, we design an algorithm scheme that transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the Statistical Query learning model is learnable from positive statistical queries and instance statistical queries alone, provided a lower bound on the weight of any target concept f can be estimated in polynomial time. We then design a decision tree induction algorithm, POSC4.5, based on C4.5, that uses only positive and unlabeled examples, and we give experimental results for this algorithm. The case of imbalanced classes, in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other, remains open. This problem is challenging because it is encountered in many real-world applications.
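
    A minimal sketch of the query transformation the abstract describes, assuming the weight w = Pr[f(x) = 1] of the target concept is known (the paper itself only requires that a lower bound on it be estimable in polynomial time); the function names and the chi predicate interface are illustrative assumptions, not the paper's code:

```python
# Sketch: estimating a statistical query Pr[chi(x, f(x))] using only
#   - positive statistical queries  (probabilities over positive instances)
#   - instance statistical queries  (probabilities over the instance space)
#
# Decomposition used (f is the target concept, w = Pr[f(x) = 1]):
#   Pr[chi(x, f(x))] = Pr[chi(x, 1) and f(x) = 1] + Pr[chi(x, 0) and f(x) = 0]
#                    = w * Pr_pos[chi(x, 1)]
#                      + Pr_D[chi(x, 0)] - w * Pr_pos[chi(x, 0)]

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def estimate_sq(chi, positives, unlabeled, w):
    """Estimate Pr[chi(x, f(x))] from positive and unlabeled samples.

    chi       -- predicate chi(x, label) -> bool (a statistical query)
    positives -- sample drawn from the distribution of positive instances
    unlabeled -- sample drawn from the underlying instance distribution
    w         -- (an estimate of) the target weight Pr[f(x) = 1]; assumed given
    """
    pos_q1 = mean(chi(x, 1) for x in positives)   # Pr_pos[chi(x, 1)]
    pos_q0 = mean(chi(x, 0) for x in positives)   # Pr_pos[chi(x, 0)]
    inst_q0 = mean(chi(x, 0) for x in unlabeled)  # Pr_D[chi(x, 0)]
    return w * pos_q1 + inst_q0 - w * pos_q0
```

    Any SQ-like algorithm can then run unchanged, with each of its statistical queries answered by this estimator instead of by labeled data.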

    Learning from Positive and Unlabeled Examples

    No full text
    In many machine learning settings, examples of one class (called the positive class) are easily available, and unlabeled data are abundant.

    Positive and Unlabeled Examples Help Learning

    No full text
    In many learning problems, labeled examples are rare or expensive, while numerous unlabeled and positive examples are available. However, most learning algorithms use only labeled examples. We therefore address the problem of learning with the help of positive and unlabeled data, given a small number of labeled examples. We present both theoretical and empirical arguments showing that learning algorithms can be improved by the use of both unlabeled and positive data. As an illustrative problem, we consider a statistical learning algorithm for monotone conjunctions in the presence of classification noise and give empirical evidence for our assumptions. We give theoretical results on the improvement of Statistical Query learning algorithms from positive and unlabeled data. Lastly, we apply these ideas to tree induction algorithms. We modify the code of C4.5 to obtain an algorithm that takes as input a set LAB of labeled examples, a set POS of positive examples, and a set UNL of unlabeled data, and uses these three sets to construct the decision tree. We provide experimental results based on data taken from the UCI repository which confirm the relevance of this approach.
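
    A hedged sketch of how POS and UNL can supplement LAB inside a C4.5-style tree builder: class proportions at a node are estimated from positive and unlabeled data via Bayes' rule and fed into the usual entropy-based split criterion. The estimator, the node_test interface, and the assumed known class weight w are illustrative assumptions, not the paper's exact modification:

```python
import math

def node_pos_probability(node_test, POS, UNL, w):
    """Estimate Pr[positive | x reaches node] from positive and unlabeled data.

    Bayes' rule: Pr[pos | node] = w * Pr_pos[node] / Pr_D[node], where
    w is (an estimate of) the overall positive-class weight (assumed given),
    Pr_pos[node] is estimated on POS, and Pr_D[node] on UNL.
    node_test(x) -> bool tells whether example x reaches the node.
    """
    p_node_given_pos = sum(node_test(x) for x in POS) / len(POS)
    p_node = sum(node_test(x) for x in UNL) / len(UNL)
    if p_node == 0:
        return 0.0
    return min(1.0, w * p_node_given_pos / p_node)

def node_entropy(node_test, POS, UNL, w):
    """Binary entropy of the estimated class distribution at a node,
    usable as the impurity measure in a C4.5-style gain computation."""
    p = node_pos_probability(node_test, POS, UNL, w)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

    With such an impurity estimate, splits can be scored even at nodes where few or no labeled examples from LAB remain.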

    Clinical features and prognostic factors of listeriosis: the MONALISA national prospective cohort study

    No full text